5 - Rechnerarchitektur [ID:10873]

Yes, the treatment of data hazards and structural hazards, and with data hazards especially

the write-after-write and write-after-read dependences.

Yes, we have learned the two solutions for those over the last two weeks,

namely the one based on the Tomasulo architecture and the one based on scoreboarding.

Okay. And yes, today I come to a new important topic, namely cache architectures.

There are a few things that we are going to take up. We have already heard them in the basic lecture.

I will repeat them briefly and then we will go into depth and also look at, for example,

how the replacement strategies and the update strategies work, i.e. the things that could not be treated in the basic lecture.
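As a small preview of the replacement strategies mentioned here, the following is a minimal sketch of LRU (least recently used) replacement in Python, assuming a fully associative cache of fixed capacity; the class name and interface are illustrative, not taken from the lecture:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal fully associative cache with LRU replacement (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # key: block address, value: data

    def access(self, addr, data=None):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return "hit"
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used block
        self.lines[addr] = data
        return "miss"

cache = LRUCache(capacity=2)
results = [cache.access(a) for a in [0, 1, 0, 2, 1]]
# The reuse of block 0 keeps it in the cache, so block 1 becomes the
# LRU victim when block 2 arrives, and the last access to 1 misses again.
```

The point of the sketch is only the ordering logic: a hit moves the block to the most-recently-used end, a miss evicts from the least-recently-used end.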

Yes, so in addition to the processor, the memory system is very important

for the performance and the cost of a computer.

And in the ideal situation it would be like this: I have sufficient capacity,

so no problem in terms of memory capacity, and the access time can always keep up with the processing speed of the processor.

So at whatever speed the processor works, I can always access the memory just as fast.

But for economic and technical reasons, that is not feasible.

I can't build register files that large. With registers it would be possible

to access the data within a single clock cycle.

So the latency would be fine, but unfortunately I cannot achieve large capacities with them.

Conversely, with RAM I can achieve large capacities,

but then I just can't get at the information within a single clock cycle.

So what is the solution? The solution that the computer architects have adopted is

a multi-level memory hierarchy. Each level is smaller, faster and more expensive per byte than the next level,

so more expensive measured in euros per bit that I can store there.

Yes, and this is the so-called inclusion condition.

Each memory of a hierarchy level contains a subset of the next larger hierarchy level.

Or, put the other way round, the contents of a level are usually always included in the next larger hierarchy level.

Whether the data is consistent is a different question, but at least the variables are present:

I have a variable that is defined in main memory, and then there is a copy in the cache and possibly in a register.
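The inclusion condition just described can be sketched as nested sets, each level a subset of the next larger one; the variable names and level sizes below are made up for illustration:

```python
# Illustrative model of the inclusion condition: every hierarchy level
# holds a subset of the next larger level (names and contents are made up).
registers = {"x"}                  # smallest, fastest level
l1_cache = {"x", "y"}              # contains everything in the registers
main_memory = {"x", "y", "z"}      # contains everything in the caches

hierarchy = [registers, l1_cache, main_memory]

def find_level(var):
    """Return the index of the first (fastest) level that holds the variable."""
    for i, level in enumerate(hierarchy):
        if var in level:
            return i
    raise KeyError(var)

# The inclusion condition itself: each level is a subset of the next one.
assert all(a <= b for a, b in zip(hierarchy, hierarchy[1:]))
```

A lookup walks the hierarchy from fastest to slowest; thanks to inclusion, whatever is found at a fast level is guaranteed to exist at every larger level as well (whether the copies are consistent is, as the lecture notes, a separate question).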

Exactly. And this is the cache, this part of the memory hierarchy.

It starts with the registers, which are very close to the processor, and then I have these different caches,

i.e. these fast intermediate memories, and by now there are up to four levels:

L1, L2, L3 and L4.

Then comes the main memory and then the hard disk, which is also possibly cached,

and it is to be expected that in the next few years this hierarchy will be extended between the main memory and the hard disk.

Probably storage class memories will come, i.e. non-volatile

semiconductor memories that are faster than the hard disk, and in the first implementations perhaps not as fast as the main memory,

but delivering more capacity.

And the technologies that are used there, you may have heard of them:

resistive RAM, phase-change memories. These are basically technologies that are meant to replace the flash memory

that you now have in cell phones and cameras, or here as a disk replacement in PCs.

Well, yes, and then we have the hard disk, optical mass storage and tape storage in the background.

Here the memory hierarchy is shown in a table.

So we start with the processor registers, which we access in one cycle.

Here are typical sizes for the capacity.

Although you always have to distinguish a bit: if you say it would be nice to have, say, 256 processor registers, the program can actually address far fewer.

Yes, that's true, they can address fewer, but in the background the hardware works with even more.

These are shadow registers that are used there, which are not accessible from the program at first, but which the hardware then uses.

Yes, then comes the primary cache with one to three cycles, then the secondary cache,

and by now the tertiary and quaternary caches have been added, with up to six cycles.

And you can see the ratio of access times to the main memory: about 40 to 1 for the primary cache and 10 to 1 for the secondary cache.

And that is of course quite a lot.
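These access-time ratios feed directly into the usual average memory access time (AMAT) formula, hit time plus miss rate times miss penalty; the concrete cycle counts below are illustrative, loosely based on the 40:1 ratio just mentioned, and the 5% miss rate is an assumption:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: the hit time plus the
    miss-rate-weighted cost of going to the next level."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: L1 hit in 1 cycle, main memory 40x slower
# (the 40:1 ratio from the lecture); a 5% miss rate is assumed.
l1_hit, mem_latency, miss_rate = 1, 40, 0.05
print(amat(l1_hit, miss_rate, mem_latency))  # 1 + 0.05 * 40 = 3.0
```

Even with a 95% hit rate, the 40-cycle penalty triples the average access time compared to a pure hit, which is exactly why the intermediate cache levels matter.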

Part of a video series:

Accessible via: Open Access

Duration: 01:29:57 min

Recording date: 2017-11-20

Uploaded on: 2019-04-30 22:09:04

Language: de-DE

The lecture builds on the content taught in the fundamentals of computer architecture and organization and continues it with more advanced topics. First, fundamental advanced techniques of pipeline processing and cache accesses in modern processors and parallel computers are covered. Furthermore, the architecture of special-purpose processors, e.g. DSPs and embedded processors, is treated. It is shown how these techniques are used in concrete architectures (Intel Nehalem, GPGPU, Cell BE, TMS320 DSP, embedded processor ZPU). A blackboard exercise and a computer exercise are offered alongside the lecture; through successful participation, 5 or 7.5 ECTS can be earned together with the lecture. In the blackboard exercises, the techniques taught in the lecture are deepened through problem solving. In the computer exercise, among other things, a simple many-core processor based on the ZPU processor is to be built with simulation tools. In detail, the following topics are covered:
  • Organizational aspects of CISC and RISC processors

  • Handling of hazards in pipelines

  • Advanced techniques of dynamic branch prediction

  • Advanced cache techniques, cache coherence

  • Exploiting cache effects

  • Architectures of digital signal processors

  • Architectures of homogeneous and heterogeneous multi-core processors (Intel Core i7, Nvidia GPUs, Cell BE)

  • Architecture of parallel computers (cluster computers, supercomputers)

  • Efficient hardware-oriented programming of multi-core processors (OpenMP, SSE, CUDA, OpenCL)

  • Performance modeling and analysis of multi-core processors (Roofline model)

Recommended literature
  • Patterson/Hennessy: Computer Organization and Design
  • Hennessy/Patterson: Computer Architecture - A Quantitative Approach

  • Stallings: Computer Organization and Architecture

  • Märtin: Rechnerarchitekturen
